Our data
To illustrate making graphs, we need some data.
Data on 202 male and female athletes at the Australian Institute of Sport.
Variables:
categorical: Sex of athlete, sport they play
quantitative: height (cm), weight (kg), lean body mass, red and white blood cell counts, haematocrit and haemoglobin (blood), ferritin concentration, body mass index, percent body fat.
Values are separated by tabs, which affects how the file is read in.
Packages for this section
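The package list for this slide did not survive. Since read_tsv and ggplot both come from the tidyverse, a reasonable assumption is that the section loads it:

```r
# Assumption: the tidyverse supplies everything used in this section
# (readr for read_tsv, ggplot2 for ggplot, dplyr for data handling).
library(tidyverse)
```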
Reading data into R
Use read_tsv (“tab-separated values”), like read_csv.
Data in ais.txt:
my_url <- "http://ritsokiguess.site/datafiles/ais.txt"
athletes <- read_tsv(my_url)
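To see what read_tsv does without downloading anything, here is a small sketch on made-up tab-separated text. The values and the name toy_text are hypothetical, and I() needs readr 2.0 or later:

```r
library(readr)

# Hypothetical tab-separated text standing in for a file such as ais.txt.
# I() tells readr to treat the string as literal data, not as a file path.
toy_text <- I("Sex\tSport\tHt\tWt\nfemale\tNetball\t176.8\t59.9\nmale\tSwim\t190.2\t82.0\n")

toy <- read_tsv(toy_text, show_col_types = FALSE)
toy        # a tibble with one row per athlete
spec(toy)  # the column types readr guessed from the values
```

Text columns come back as character and number columns as double, which is what decides later whether a variable is treated as categorical or quantitative.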
Types of graph
Depends on number and type of variables:
Categorical  Quantitative  Graph
1            0             bar chart
0            1             histogram
2            0             grouped bar charts
1            1             side-by-side boxplots
0            2             scatterplot
2            1             grouped boxplots
1            2             scatterplot with points identified by group (e.g. by colour)
With more (categorical) variables, we might want a separate plot for each group. This is called facetting in R.
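The choice of graph is just a lookup on the two counts. A minimal sketch, where suggest_graph is a hypothetical helper written for this illustration and not part of any package:

```r
# Hypothetical helper: look up the graph type from the number of
# categorical and quantitative variables.
suggest_graph <- function(n_cat, n_quant) {
  key <- paste(n_cat, n_quant)  # e.g. "1 1"
  switch(key,
    "1 0" = "bar chart",
    "0 1" = "histogram",
    "2 0" = "grouped bar charts",
    "1 1" = "side-by-side boxplots",
    "0 2" = "scatterplot",
    "2 1" = "grouped boxplots",
    "1 2" = "scatterplot with points identified by group",
    "no standard choice listed"  # fallback for any other combination
  )
}

suggest_graph(1, 1)  # Sex and BMI, for example
suggest_graph(0, 2)  # Ht and Wt, for example
```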
ggplot
R has a standard graphing procedure, ggplot (from the ggplot2 package, part of the tidyverse), that we use for all our graphs.
Use it in different ways to get the precise graph we want.
Let’s start with a bar chart of the sports played by the athletes.
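The "different ways" come from adding layers with +. A minimal sketch on made-up data (the toy data frame is hypothetical, not the athletes data):

```r
library(ggplot2)

# Made-up data for illustration only.
toy <- data.frame(
  Ht = c(170, 175, 180, 185, 190),
  Wt = c(60, 68, 74, 80, 88)
)

# ggplot() says which data frame and which variables to plot ...
base <- ggplot(toy, aes(x = Ht, y = Wt))

# ... and each geom_ added with + says what to draw with them.
base + geom_point()
base + geom_line()
```

The same base can be reused with different geoms, which is why every graph in this section has the same overall shape.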
Bar chart
ggplot(athletes, aes(x = Sport)) + geom_bar()
Histogram of body mass index
ggplot(athletes, aes(x = BMI)) + geom_histogram(bins = 10)
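The number of bins is a choice, and the picture can change with it. A sketch on made-up BMI values (hypothetical, not the athletes data), also showing binwidth as the other way to specify the bins:

```r
library(ggplot2)

# Made-up BMI values for illustration only.
toy <- data.frame(
  BMI = c(18.5, 19.8, 20.4, 21.1, 21.9, 22.3, 22.8, 23.5, 24.6, 26.2, 28.0, 31.5)
)

# Say how many bins to use ...
p_bins <- ggplot(toy, aes(x = BMI)) + geom_histogram(bins = 5)
p_bins

# ... or say how wide each bin should be (here 2 BMI units).
p_width <- ggplot(toy, aes(x = BMI)) + geom_histogram(binwidth = 2)
p_width
```

Trying a few values of bins is usually worthwhile before settling on one.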
Which sports are played by males and females?
Grouped bar chart:
ggplot(athletes, aes(x = Sport, fill = Sex)) +
  geom_bar(position = "dodge")
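The bar heights are counts of each Sport and Sex combination. To see those numbers directly, a sketch using dplyr's count on a made-up data frame (the toy values are hypothetical):

```r
library(dplyr)

# Made-up data for illustration only.
toy <- tibble(
  Sport = c("Row", "Row", "Row", "Swim", "Swim", "Netball", "Netball"),
  Sex   = c("female", "male", "male", "female", "male", "female", "female")
)

# One row per combination that occurs, with n = number of athletes.
counts <- count(toy, Sport, Sex)
counts
```

A combination that never occurs (here, male Netball players) has no row, and correspondingly no bar in the grouped bar chart.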
BMI by gender
ggplot(athletes, aes(x = Sex, y = BMI)) + geom_boxplot()
Height vs. weight
Scatterplot:
ggplot(athletes, aes(x = Ht, y = Wt)) + geom_point()
With regression line
ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + geom_smooth(method = "lm")
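method = "lm" asks for a least-squares straight line. A sketch of the same kind of fit done directly with lm, on made-up data (hypothetical values, not the athletes data):

```r
# Made-up data for illustration only.
toy <- data.frame(
  Ht = c(170, 175, 180, 185, 190),
  Wt = c(60, 68, 74, 80, 88)
)

# Regress weight on height: the line geom_smooth(method = "lm")
# would draw through these points.
fit <- lm(Wt ~ Ht, data = toy)
coef(fit)  # intercept and slope (kg per cm of height)
```

The grey band that geom_smooth draws around the line by default is a confidence band for the fitted line; se = FALSE turns it off.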
BMI by sport and gender
ggplot(athletes, aes(y = Sport, x = BMI, fill = Sex)) +
  geom_boxplot()
A variation using a ColorBrewer palette that is flagged as colour-blind-friendly (Paired; Set3 is not flagged as such):
library(RColorBrewer)
ggplot(athletes, aes(colour = Sport, y = BMI, x = Sex)) +
  geom_boxplot() + scale_color_brewer(palette = "Paired")
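RColorBrewer records which of its palettes it considers colour-blind-friendly. A sketch for listing the qualitative ones, assuming the RColorBrewer package is installed:

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette, with columns
# maxcolors, category ("qual", "seq" or "div") and colorblind.
info <- brewer.pal.info

# Qualitative palettes flagged as colour-blind-friendly.
friendly <- rownames(info)[info$category == "qual" & info$colorblind]
friendly

# How many colours each of them can supply.
info[friendly, "maxcolors", drop = FALSE]
```

The maxcolors column matters when a variable has many levels: a palette with fewer colours than levels leaves some groups uncoloured.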
A variation that uses fill instead of colour:
ggplot(athletes, aes(x = Sport, y = BMI, fill = Sex)) +
  geom_boxplot()
Height and weight by gender
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point()
Height and weight by gender for each sport, with facets
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~ Sport)
Filling each facet
By default, every facet uses the same scales. To let each facet have its own scales, add scales = "free":
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~ Sport, scales = "free")
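The scales can also be freed in one direction only. A sketch on made-up data (hypothetical values, not the athletes data):

```r
library(ggplot2)

# Made-up data for illustration only: two sports with very different sizes.
toy <- data.frame(
  Sport = rep(c("Gym", "Row"), each = 3),
  Ht    = c(150, 155, 160, 185, 190, 195),
  Wt    = c(45, 48, 52, 80, 86, 92)
)

p <- ggplot(toy, aes(x = Ht, y = Wt)) + geom_point()

# Each facet gets its own y-axis, but all share the x-axis.
p_free_y <- p + facet_wrap(~ Sport, scales = "free_y")
p_free_y

# Each facet gets its own x-axis, but all share the y-axis.
p_free_x <- p + facet_wrap(~ Sport, scales = "free_x")
p_free_x
```

Free scales fill each facet better, at the cost of making facets harder to compare with each other, because the same position no longer means the same value.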
Another view of height vs weight
ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + facet_wrap(~ Sex)